Terminology Extraction from the Baidu Encyclopedia

نویسنده

  • Bulat Fatkulin
چکیده

The article examines the use of the applied linguistics technologies in the teaching of orientalistics in the Russian Federation higher education system. The research discloses the methods of the terminological units extracting using texts in Chinese studies of the modern Afghanistan. The author shows the solutions for intensive summarization and annotation of Chinese texts and language teaching methods for students to work with assistive software. The achievements of the Stanford NLP group are used for the Chinese text segmentation and named entity recognition.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Cross-language and Cross-encyclopedia Article Linking Using Mixed-language Topic Model and Hypernym Translation

Creating cross-language article links among different online encyclopedias is now an important task in the unification of multilingual knowledge bases. In this paper, we propose a cross-language article linking method using a mixed-language topic model and hypernym translation features based on an SVM model to link English Wikipedia and Chinese Baidu Baike, the most widely used Wiki-like encycl...

متن کامل

ISCAS at Subtopic Mining Task in NTCIR9

In this paper, we describe our work at subtopic mining subtask in NTCIR-9 in simplified Chinese. To find possible subtopics of a specific query, we select related queries recorded by query log, or titles of searching results provided by Google and Baidu, or the catalog of corresponding entry in Baidu encyclopedia, which are lexically similar as the original query, then we apply k-means algorith...

متن کامل

Data Warehouse Back-End Tools

The back-end tools of a data warehouse are pieces of software responsible for the extraction of data from several sources, their cleansing, customization, and insertion into a data warehouse. They are known under the general term extraction, transformation and loading (ETL) tools. In all the phases of an ETL process (extraction and exportation, transformation and cleaning, and loading), individ...

متن کامل

Encyclolink: A Cross-Encyclopedia, Cross-language Article-Linking System and Web-based Search Interface

Cross-language article linking (CLAL) is the task of finding corresponding article pairs across encyclopedias of different languages. In this paper, we present Encyclolink, a web-based CLAL search interface designed to help users find equivalent encyclopedia articles in Baidu Baike for a given English Wikipedia article title query. Encyclolink is powered by our cross-encyclopedia entity embeddi...

متن کامل

Exploiting a Multilingual Web-based Encyclopedia for Bilingual Terminology Extraction

Multilingual linguistic resources are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. This article seeks to explore and to exploit the idea of using multilingual web-based encyclopedias such as Wikipedia as comparable corpora for bilingual terminology e...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014